Bernoulli 17(2), 2011, 592-608 
DOI: 10.3150/10-BEJ295 

't-h" : Stochastic comparisons of stratified sampling 

o ; techniques for some Monte Carlo estimators 

P^! LARRY GOLDSTEIN^, YOSEF RINOTT^ and MARCO SCARSINI^ 

'^U( ' ^Department of Mathematics, University of Southern California, Kaprielian Hall, Room 108, 

pg I 3620 Vermont Avenue, Los Angeles, CA 90089-2532, USA. E-mail: larry@math.usc.edu 

r<\ ■ ^Department of Statistics and Center for the Study of Rationality, Hebrew University of 

Jerusalem, Mount Scopus, Jerusalem 91905, Israel and LUISS, Roma, Italy. 

' \ E-mail: rinott@mscc.huji.ac.il 

P^ . '^ Dipartimento di Scienze Economiche e Aziendali LUISS, Viale Romania 12, 1-00197 Roma, 

^aJ ' Italy and HEC, Paris, France. E-mail: marco.scarsini@luiss.it 

^ ' We compare estimators of the (essential) supremum and the integral of a function / defined 

Son a measurable space when / may be observed at a sample of points in its domain, possibly 
, , ' with error. The estimators compared vary in their levels of stratification of the domain, with the 

result that more refined stratification is better with respect to different criteria. The emphasis 
^S] ' is on criteria related to stochastic orders. For example, rather than compare estimators of the 

^ ' integral of / by their variances (for unbiased estimators), or mean square error, we attempt the 

^T , stronger comparison of convex order when possible. For the supremum, the criterion is based on 

^~~^ • the stochastic order of estimators. 

^' 
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§ ■ 1. Introduction 

^ I In many situations, the cost of computing the value of a function / is very high, because 

■ ^ ■ cither the analytic expression of the function is extremely complex or the value is the 

r> ' result of a costly experiment. For example, / could be the level of toxicity as a reaction 

jrt ' to different doses of certain drugs, the output of a chemical experiment, or the survival 

time of a patient undergoing a certain treatment. Therefore the function can be computed 

only at a limited number of points. One standard way to choose these points is via some 

Monte Carlo randomization. Different possibilities arise: points could be sampled totally 

at random or some stratification could be used. When properly carried out, stratification 

is known to improve the performance of estimators. The purpose of this paper is to qualify 

the above statement in some relevant cases and compare different sampling stratifications 

according to some suitable criteria. 
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Often the object of interest is some functional of /, such as its supremum or integral. 
Monte Carlo estimation of such functionals is the subject of a very large number of 
papers. In most cases some regularity of the function / is assumed; see, for example, 
[18, 26]. Under some regularity conditions it is often reasonable to estimate the entire 
function and then use a plug-in method to estimate the functional. When no regularity 
is assumed for /, then it may be more reasonable to estimate the functional directly. 

Given a measurable space (il,W), let /:ii— >-R be a measurable function /. In order 
to estimate 9 := swp^^^j f{x), we can draw a sample Xi, . . . ,Xn of n points in il and 
use the estimator T := niax{f(Xi), . . . , /(X„)). Alternatively wc can sample the X's by 
resorting to some stratification. Ermakov, Zhiglyavskii and Kondratovich [6], Kondra- 
tovich and Zhigljavsky [11] and Zhigljavsky and Zilinskas [25] prove that, if we consider 
two partitions of il, one of which is a refinement of the other, and we sample in pro- 
portion to the measure of each element of the partition, then the more refined partition 
produces a stochastically larger estimator of the supremum. Since these estimators are 
almost surely smaller than 9 (hence biased) and consistent, the stochastically larger one 
performs better. Thus, the more we stratify, the better the estimator we obtain. 

In our paper we extend this result and show that the stochastic comparison for esti- 
mators of the supremum holds also when observations arc censored, that is, when for a 
sample of pairs of random variables (Ui, Zi) we only know whether Zi < f(Ui) or not. In 
applications, there may be situations where exact evaluation of f{u) at a given point is 
difficult or expensive, whereas a comparison of f{u) to a given constant t is (at least for 
most values of t) much easier. For example, if f{u) represents a lifetime, it may be easier 
to see if it has exceeded a certain value, rather than wait to obtain the exact value f{u) 
itself. This amounts to censoring. 

When we want to estimate the integral /(/) of the function /, then it is easy to con- 
struct an unbiased estimator of /(/) by using different stratified samples. Unbiasedness 
of these estimators implies that the comparison criterion cannot be the stochastic order, 
as used for the maximum. 

In much of the literature estimators are compared in terms of a given loss function, 
which may be arbitrary. Typically the loss function is quadratic, so the criterion is the 
mean square error, that is, the variance, when the estimator is unbiased. More generally, 
it may be possible to find comparison criteria that arc valid for large classes of loss 
functions; for instance, all losses of the type \W — I{f)\^, where W is an estimator of 
/(/) and p>l, or even the class of all convex loss functions. The use of the entire class 
of convex loss functions in inference goes back at least to [13] and [14]. Similar ideas were 
later used by Berger [2], Kozek [12], Lin and Mousa [15], Eberl [5], Bai and Durairajan 
[1] , and Petropoulos and Kourouklis [20] . A comparison of the performance of different 
estimators, with respect to all convex loss functions, can be achieved by considering the 
convex order. Comparison of experiments in terms of the convex order traces back to 
[3,4]. 

It is well known that stratification reduces the variance of estimators of /(/), but, as 
will be shown below, stratification docs not necessarily reduce K[\W — /(/)]p], for p ^ 2, 
which implies that, even if stratification is useful in L2 , it may be counterproductive in Li . 
We will show that in some circumstances stratified sampling is better not just in L2, but in 
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terms of the convex order, which in turn imphes that it is better in Lp for every p > 1. This 
is the case when observations are censored, the function / is univariate and monotone, 
or the function is multivariate and monotone and the samphng is independent across 
coordinates. Papageorgiou [19] shows the computational advantage of using randomized 
methods to compute the integral of monotone d-variate functions, and shows how this 
depends on d. 

Our results also hold when the function / can only be observed with noise; for in- 
stance, when / is observed as the outcome of some experiment. Moreover, our regularity 
assumptions on the function / are rather non-restrictive: measurability when estimating 
the maximum, boundedness when observations are censored, and sometimes monotonic- 
ity when estimating the integral. 

We emphasize that, in our framework, evaluation of / by experiment is the costly part 
and any precalculations, such as those required for computing strata and sampling from 
the conditional distributions in strata, even if computer-time consuming, are considered 
to have a relatively negligible cost. 

The paper is organized as follows. Section 2 fixes notation and reviews various proper- 
ties of stochastic orders and certain dependence structures. Section 3 compares estimators 
of the supremum of a function, considering also the case of censored observations. Sec- 
tion 4 compares estimators of integrals: First a variance comparison is shown to hold 
in general, even when observations are affected by errors. Then a counterexample is 
provided for a non-quadratic loss function. Then censored observations are considered 
and a comparison in terms of the convex order is proved in this case. Finally, monotone 
functions are examined. In the univariate case, a convex order comparison holds. In the 
multivariate case, this is true under some additional conditions on the stratification and 
on the dependence of the underlying random vector. 

Numerical examples can be found in [8]. 

2. Notation and preliminaries 

In this paper a probability space (17, J-", P) is assumed in the background. The stochastic 
order <st, the convex order <cx, the increasing convex order <icx, and the majorization 
order -< are defined as follows (see, e.g., [16, 17, 24]). Given two random vectors X, Y, 
we say that Y <st X if 

E[0(Y)]<E[0(X)] (2.1) 

for all non-decreasing functions 4>. We say that Y <cx X if (2.1) holds for all convex 
functions (j) a-nd Y <icx X if (2.1) holds for all non-decreasing convex functions cf). It is 
well known that Y <st X iff P(Y G A) < P(X G A) for all increasing sets A, where we 
call a set increasing if its indicator function is non-decreasing. In the case of univariate 
random variables X, Y, the above inequality becomes P(F < t) > P{X < t) for all t G M. 
It is weh known that X <cx Y implies E[X] = E[Y] and Var[X] < Var[r]. 

The statement Y <st X depends only on the marginal laws C{Y) and CpC), so some- 
times we write >C(Y) <st 'C(X), and analogously for <cx and <icx- 
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Given two vectors x = (xi, . . . ,a;„), y = (?;i, . . . , j/„), we write y ^ x if 

k k n n 

'^yf<'^xf for fc=l,...,n-l, '^yi = "^xi, 

i—l 2—1 i— 1 z— 1 

where j/i > • • • > j/i is the decreasing rearrangement of y, and analogously for x. The 
relation y -< x holds if and only if there exists an n x n doubly stochastic matrix D such 
that y = Dx. 

A function ip : R" ^ M is called Schur convex or Schur concave if y -< x implies ip{y) < 
■0(x) or ip{y) > ip{x.), respectively. If (^ : R — > M is convex then '0(x) = X]"=i 'fii^i) is Schur 
convex. 

A random vector X is associated if for all non-decreasing functions 0, ip we have 
Cov[</)(X),V(X)]>0. 

Recall that a subset A C K'' is a lattice if it is closed under componentwise maximum 
V and minimum A. A random vector X is multivariate totally positive of order 2 (MTP2) 
if its support is a lattice and its density /x with respect to some product measure on 
W^ satisfies /x(s)/x(t) < /x(s V t)/x(s A t) for aU s,t gM''. MTP2 implies association. 
Also, any vector having independent components is MTP2. 

Let [/ be a random variable with values in some measurable space (IX, W) with non- 
atomic law Pjj. A finite sequence B = {Bi, . . . , Bb) of subsets of U is called an ordered 
partition of il if Bi D Bj = for i,j £ {1, . . . , 6}, i j^ j, and lJi=i ^i = ^- For the sake of 
brevity in the sequel, whenever we say "partition" we mean "ordered partition." 

Here wc consider partitions B = {Bi, . . . ,Bi,) of il, where the sets Bi arc measurable 
and such that for z = 1, . . . , fe we have P{U G Bi) = ki/n for some fc^ G {1, . . . , n} satisfying 
J2i h ^ n. We say that such a partition S of it and a partition B* = (S*, . . . , i?^) of 
N := {1, . . .,n} are associated if the cardinalities \B* \ of the sets B* satisfy \B* \ = ki for 
i = 1, . . . , 6. We then have 

PiUeB,)^^-^. (2.2) 

n 

The notation B (LB means that B is one of the sets Bi that comprise B and, given B E B, 
we let B* denote the corresponding set B* in B* such that (2.2) holds. 

Given two partitions B* = {Bl,..., B*) and C* = (Q , . . . , C*) of N, we write C* <rcf . 
B*; that is, that B* is a refinement of C* when every set in C* is the union of sets in B* . 
We will use the same order <ref. for partitions of il. Clearly, if C and B arc partitions of il, 
each of which can be associated to some partition of N , then C <rGf . B implies that there 
exist partitions C* and B* associated to C and B, respectively, satisfying C* <,.of. B* . 

Call A* = ({1}, . . . , {n}) the finest partition of N and V* = (N) the coarsest partition 
of N. Then V* <ref. B* <rcf. A* for aU B*, and for any partition y^ of il associated to A* 
we have F{U G A,) = l/n. 

For a partition B and B E B, let -P[/|b denote the conditional law of U given U € B. Let 
{V^ ,j G B*} be random variables with law Pu\b with {V^ ,j G B*,B G B} independent. 
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3. The supremum 

Let /:iI^>M be measurable, and define 

W^ = max max f{Vf), (3.1) 

where the subscript S indicates that Wg will be used to estimate the (essential) supre- 
mum of the function /. 

Given a random variable U with values in {il,U), let /* := esssup/(t/). It is clear 
that for any choice of partition B, F{Wg </*) = !. The following result compares two 
estimators of type W^ . Since both estimators underestimate /*, the stochastically larger 
one is preferable. This theorem, which goes back to [6] and [11], can also be found in 
[25], Theorem 3.4. 

Theorem 3.1. IfC<,-ci. B, then W^ <st W^ . 

A short proof of Theorem 3.1, different from the one in the [25], can be found in the 
Appendix. 

As mentioned in the Section 1, data are not always observed exactly in many practical 
situations, but may be censored for various reasons, including budget constraints. We 
extend now the comparison result of Theorem 3.1 to the case of censored observations. 
Let / :il — T- M be bounded; without loss of generality, we take < /(u) < 1 for all u G it. 
In this section we assume that, for a sample of points of the type {u,t) G it x [0, 1], we 
are allowed to observe only the value of t and whether t > f{u). 

For any partition B with associated partition B*, let {K^, j G S*}, B e B and {Tj,j S 
N} be independent random variables with law Pu\b and the uniform distribution on 
[0, 1], respectively, and let 



S^=[j {j e B* : T, < fiVf)} and W^s = maxT,-. 
Bee ^^^ 

When S''^ = we set W^^s ~ ^- ^^'^ letter C in the subscript CS indicates censored data. 
It is clear that P(VFcg < /*) = 1, so the estimator W^g underestimates /*. 

Theorem 3.2. //C <rcf. B, then W^^ <st W^g. 

Proof. Below, when we write V^ without specifying B, we mean that B Q B corresponds 
in the sense of (2.2) to the set B* G B* , which contains the index j. For any t G [0, 1], we 
may calculate the distribution function of Wq^ at t by writing 



{wEs<t}= U { n^|$ ^3 <t.S''^ r] 

RCN ^^ 

= U {T, < t, T, < f{Vf) for all j G R, and T, > f{Vf) for all j i i?,} 



RCN 
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= [j {Tj<tA f{Vf) for all j e R, and Tj > f{Vf) for all j ^ R}. 



(3.2) 



RCN 

Hence, conditionally on {V^ , j ^ B* , B <E B}, using the fact that the Tj's are uniform, 
we obtain: 

nwEs<t\Vf,jeB*,BeB) 

= E n p(^^- ^ ^ ^ fi^fy) n ^i^, > nvf)) 

RCNjeR j^R 

= E n(*^/(^/))n(i-/(^'')) 

RCNjeR j(^R. 

= E--E E n(^A/(T/-))n(i-/(^^)). 

/ii = l h5 = l RCN jCR j^R 

Vi,\RnB^\=hi 

Taking expectation we obtain the unconditional distribution, 

i^ii i^^'i '' /\u*\\ / r ^ ''' 

hi=i hb=ii=i ^ ' ^ ^•^^' 

X ( / {l-f{u))dPulBA^)' 



n ( / (i A /(u)) dPulBiu) + / (1 - /(«)) dPc;|sH 



Let 



g^= / itAfiv))dPu\Biv)+ / (l-/(iO)dF^|B(«) 
[itAfiv)) + il-fiv))]dPuiBiv). 



If C is a union of disjoint sets i?i, then 



If C <rci. B, then 
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To see this, observe that (3.3) imphes that the vector on the left-hand side above is 
obtained from the one on the right by multiplying it by the nxn doubly stochastic matrix 
D, which is block diagonal where the «th block is the \C*\ x \C*\ matrix with all entries 
equal to 1/|C*|. Therefore, by the Schur concavity of the function (0i, . . . ,0„) i-> J|"^j 0^, 
we have 

nwSs<t) = 11(9'')"'*' > n iq''y^'^=nwSs<t)- □ 

cec BeB 

For every n G N and for every partition Bn associated to a partition B* of {1, . . . ,n}, 
we have W^S <st W^" . Therefore, 

Wg"<,tWSs<stWi"<str. 

Since W^^g" is consistent for /* as n — > oo, we have that W^g and Wg" are consistent, 
too. 



4. The integral 

With the subscript I standing for integral, let 

<-^EE/(^")' (4.1) 

BGBjGB' 

< = ^E E(/(^/)+^^-), (4.2) 

BGBjGB- 

where the variables Ej are independent copies of a random variable e having mean and 
finite variance, independent of the variables V^^ . Clearly Wf and W^ are both unbiased 
estimators of / := E[/([/)] = / /([/) dP when / |/(C/)| dP is finite, and Wf is the special 
case of Wjg when the error has zero variance; that is, there is no measurement error. 

The following result is well known when the error has zero variance (see, e.g., [7], 
Section 4.3). We extend it to a more general case, relevant when the evaluation of / is 
the result of an experiment. 

Theorem 4.1. //C <,.cf. B, then Var[Wi|] < Var[Wi'|5]. 

The proof of Theorem 4.1 can be found in the Appendix. 

It follows immediately from Theorem 4.1 that Var[Wjg] < Var[Wjg], hence, in particu- 
lar, Var[VFj'^] < Var[Wj^]. The following counterexample shows, nevertheless, that, even 
when the function is observed without error, Wj ^cx W^; that is, domination in the 
convex order does not hold. In the counterexample we consider the absolute error, that 
is, (ii), rather than mean square error, (^2)- 
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Example 4-2. Let il= [0, 1] and U have a uniform distribution on [0, 1]. Furthermore, 

let 71 = 2, Ai =:[0, 1/2], ^2 = (1/2,1]. Define 

f{u) = 4/[o,i/2] (u) + 2/(1/2^3/4] (u) + 6/(3/4,1] {u). 

Then W^ takes the values 2,3,4,5,6 with probabilities (1,4,6,4, 1)/16, respectively. 
The variable W^, based on one random observation from each of the above intervals Ai, 
takes the values 3 and 5 each with probability 1/2. Therefore, E[W^^] = 4 = E[Mf ]. 

Wc have Var[Wj^] = Var[H/j'^] = 1, but for the convex function ifi{u) = |u — 4| we have 



E[iP{W{')]^E\W{ 



V 



2^ + 2^: = T^ < 1 = JEll^i-^ - 4| = E^W,^)]. 



A more general example can be constructed as follows. Consider a partition A associ- 
ated to the finest partition A* of A^. Split Ai into two measurable subsets Aia,An, such 
that P{U G Ala) = ^{U G Alb) = 1/(2?^)- Consider now a function / defined as follows: 




if UG Ala, 

if uG Alb, 
elsewhere. 



(4.3) 



For all i G TV we have E[f{U) \UeAi]^0 and 

Var[/(C/)|C/GA,] = 
Hence 



for i = l, 
for i 7^1. 



Var[l^f^] = E[{W^^f] 
Moreover, if Vi, . . . , F„ are i.i.d. copies of U, 



VarfVl/, 



Analogously 



Therefore 



:Var 



1 " 1 " 1 

-E/(^.) =;;2EVar[m)] = -^=Var[W^i-^]. 



n 



E[\f{U)\\UeA,]^ 



j=i 



0, 



for i = 1 , 
for iy^l. 



E\Wi^\^jE[{Wi^y-] 



rA\2] - 



For any square integrable random variable Y we have E|y| < ■\/E[y2] and the inequality 
is strict if Y is not almost surely constant. Hence 



W^\ < jE[{Wf)^] = jE[{Wi^)^]^E\Wi 



1 
n 
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Example 4.2 proves that the convex order does not hold in general between estimators 
Wf and W^ when C <rcf. S- Nevertheless, in the following subsections we show that 
under some natural conditions comparisons in the convex order are possible. 



4.1. Censored observations 

Keeping the notation and spirit of Section 3, consider a function / such that < f{u) < 1 
for all u G il. Assume that for a sample of points of the type {u, t) G il x [0, 1] we are 
allowed to observe only the value of t and whether t < f{u). Let 



^ci--Xl H hT,<f{vf 



B£B]eB* 

Note that W^^ is an unbiased estimator of / = E[/(C/)], as 



n — — n — — jsijn 

BGBj&B' BGBjs^B*''^-'^ 

= T.^-^ f /(") dPuiBiu) = J2 nsMfiu) \ueB] 

B£B " •'^ BeB 

= E[/(C/)]. 
Theorem 4.3. //C <rof. B, then W^^ <cx M^ci- 
Proof. By a result in [9] (see also [16], Sections 12. F and 15. E) if 

1 " 

n ^-^ 

1=1 

where ^i , . . . , ^„ are independent Bernoulli variables with parameters pi , . . . , p„ , and p = 

{pi,...,Pn), then 

p^q implies Xq<cx^p- (4.4) 

Define 

p^=FiT, < fiVf)), p^=FiT, < fiVf)), 
and 
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i 

SO p ^ q and invoking (4.4) completes the proof. 



\c\ 



n 



Notice that in the case of censored observations, the comparison holds in the con- 
vex order, whereas in the case of perfect observation, a variance comparison holds, but 
Example 4.2 shows that comparisons in the convex order do not. 

4.2. Univariate monotone functions 



In the rest of this subsection the space il is totally ordered and, without loss of generality, 
we choose il = [0,1]. For subsets G and H of the real line, we write G < H iig <hior every 
g &G and h E H. We call a partition B = {Bi ,... , Bi,) of it monotone ii Bi < ■ ■ ■ < B^. 

Theorem 4.4. Let B and C be monotone partitions of ii and let C <rcf. B. If f is non- 
decreasing, then 



Wg<,^Wf^. 



To prove Theorem 4.4 we will apply the following lemma. 



(4.5) 



Lemma 4.5. Let ^ and rj he random variables such that ^ <st 'H, o-nd let ^i and rjj be 
independent copies of ^ and rj, respectively. Let K be an integer-valued random variable, 
independent of all ^j and rjj , satisfying K < m for some integer m and having an integer- 
valued expectation, E[i4'] =k. Then 



k m K m 

E^i+ E '?J<cxE^J+ E ''j- 

j = l j=k+l 3 = 1 j=K+l 



(4.6) 



Proof. Since ^ <st V we may construct i.i.d. pairs {^i,r]i) with P(^i < r]i) = 1 for all 
i = 1, . . .,m. We adopt the usual convention that if fc = 0, then J2i=i ^j ^ ^- First note 
that, by Wald's lemma. 



E 



E^^+ E "^^ 

U = l j=k+l . 



K 



E^^+ E ^^■ 

lj=l j=K+l . 



Therefore (see, e.g., [17], Theorem 1.5.3) it suffices to show that 

km Km 

E^j+ E %<icxE^j+ E ^j- 

3 = 1 j=k+l 3 = 1 j=K+l 
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Let (j) be an increasing convex function and set 

/ k 



Note that 



and 



g{k):^E 






g{k)=E 



K 






A' = fc 



%W]=E 



K 



E^^+ E '^^ 



Thus we have to show that g{k) < E[gi(X)]. Since E[K] = k, this follows readily by 
Jensen's inequality, once we prove that g{k) is a convex function. 

The following part of the proof follows ideas of Ross and Schcchncr [22]. Setting 



j=l 3=k+2 

we have 

g{k + 1) - g{k) = E[0(a+i + Sk)] - E[cb{vk+i + Sk)]. 
Since </> is convex, and ^a:+i < Vk+i, the function 

h{s) := E[</)(a+i + Sk) I 5fc = s] - E[0(77fe+i + 5^) | 5^ = s] 

is decreasing in s. Now note that 

k-\-l m km 



s'fc+i=E?'+ E ^/» <st s'fc = ^ c» + E '^*' 



2—1 i — /C + S 



1=1 j:=A:+2 



because ^fc+i <st ??fc+2- Hence ^(fc + 1) — g{k) — E[/i(5fc)] is increasing in fc, thus proving 
that g is convex, as required. D 

Proof of Theorem 4.4. Since B — (Bi, . . . , Bh) and C = (Ci, . . . , Cc) are monotone 
partitions satisfying C <rcf . -B, there exist 1 = ii < ^2 < • • • < ic < V+i = f' + 1 such that 

iq + l-l 

C, = IJ Bj for (? = l,...,c. 
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As the union above may be formed by taking the union of two consecutive sets at a time, 
it suffices to prove (4.5) for the case where c = b — 1, C,„ = B„i U Bm+i, Ck = Bk for 
fc e {1, . . . ,m — 1}, and Ck = B^+i for k G {m + 1, . . . ,c}. 
In this case we have 



W, 



1 



IE 



W^IE = 



'E E/(^")+E/(^"'")- 



E /(^"""^^) + E^. 






jew 



Note that 




where K is binomially distributed with parameters 

I R* 

It is easy to see that if two variables are ordered by the convex order (see (2.1)) and we 
add the same independent variable to each one, to wit, X^iew ^i' then the convex order 
is preserved. This fact and Lemma 4.5 now yield (4.5). D 



4.3. Multivariate monotone functions 



In this section we extend the results in Section 4.2 to the multivariate case. When we 
consider multivariate monotone functions, stratifying can still yield improvement in the 
convex order, but some restrictions are needed, both on the distribution of the ran- 
dom vector used for sampling and on the stratifying partitions. More specifically, we 
consider estimation of an integral with respect to a random vector whose components 
are independent and under a stratification that preserves independence on each set of 
the partition. The result we prove below actually only requires that the random vector 
have an MTP2 distribution (independence being a particular case of it) and that the 
stratification preserves MTP2. 

Let / : [0, 1]'' — ^ [0, 1] be non-decreasing in each variable and let U be a random vector 
taking values in [0, 1]'' with a non-atomic distribution. Our goal is to show that the esti- 
mate of E[/(U)] improves by refining stratifications as follows. Recalling the definitions 
in Section 2, start with a partition C = (Ci, . . . , Cb) of [0, 1]'' such that for some i the 
distribution C{\5 | U S Ci) is associated. Then split Ci into CiOG and CiOC^, where 
G is an increasing set. Lemma 4.8 below shows that the new partition obtained by this 
splitting achieves a better estimator of the integral in terms of the convex order and 
Theorem 4.6 provides some conditions for its application. 
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Theorem 4.6. Consider a partition C ~ (Ci, . . . , Cc) of [0, 1]'' where each Ci is a lattice. 
Let B he a partition obtained by a sequence of refinements C = Ci <ref . • • • :^ref . Cm = B, 
such that for fc = 1, . . . , ttt, — 1 the partition Ck+i is obtained from Ck by splitting one set of 
Ck, say Ci^^k, into C^^^k'^Gk andCi^^f^Gl, where G^ = {x = (xi, . . . ,Xd) S [0,Vf:ak < 
Xj} for some Ok G [0, 1] and some j G {I,. . .,d}. 

If U is MTP2 on [0, l]'^ and / : [0, l]"* ^ [0, 1] is non- decreasing, then W^j| <cx Mf^ • 

As mentioned earlier, independence is a particular (and in our framework the most 
important) case of MTP2. Independence makes simulation of a multivariate random 
vector easy, even when conditioned on an interval, since the strata can be constructed 
by knowing only the quantiles of the marginal distributions. If the cost of simulation is 
negligible relative to the cost of evaluating /, then even rejective sampling can be used, 
once the strata are defined. 

The proof of Theorem 4.6 is preceded by the following lemmas. 

Lemma 4.7. //U is an associated random vector, and G is an increasing set, then 

/:(u|ueG")<,t/:(u|UGG). (4.7) 

Conversely, if (4-7) holds for every increasing set G, then U is associated. 
Proof. First note that (4.7) is equivalent to 

P(U e .4|U e G) > P(U e ^|U e G") 

holding for all increasing sets A. The latter inequality is easily seen to be equivalent to 

P(u e yl n G) [1 - P(u G G)] > [P(u e A) - P(U e A n G)]P(U e G). 

By simple cancelation this inequality is equivalent to 

P(U e A n G) > P(u e ^)P(U e G), 

which is equivalent to association of the random vector U by, e.g., Shaked [23]. D 

Lemma 4.8. Consider a partition C ~ (Gi,...,Gc) of [0,1]'' such that for some Gi 
the distribution C{\J \ U G Ci) is associated. Let G be an increasing set and let B = 
(Gi, . . . , Gj_i, Gi n G, Gi n C, Gi+i, . . . , Gc). // / : [0, 1]'' -^ [0, 1] is non- decreasing, then 

Proof. With C{Vi) = C{U | U G G, n G=) and ^(Va) = C{IJ | U e Gi n G), Lemma 4.7 
yields Vi <st V2. The monotonicity of / implies f(Vi) <st /(V2), and Lemma 4.5 now 
proves the claim, applying arguments as in the proof of Theorem 4.4. D 

The following result can be found in [10]. 
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Lemma 4.9. // an MTP2 vector U takes values in a lattice of which C is a suhlattice, 
then £(U | U G C) is MTP2 and hence associated. 

The following corollary is obvious, and only requires the fact that the intersection of 
sublattices is a lattice. 

Corollary 4.10. // an MTP2 vector U takes values in some lattice, and C , G and C^ , 

are all sublattices, then both C{\5 | U e C n G) and £(U | U e C n G^^) are MTP2, and 
hence also associated. 

Proof of Theorem 4.6. We first prove by induction that £(U | U £ Ci^k) are MTP2 for 
all Ci^k G Ck and fc = 1, . . . , m. For k~l this follows from Lemma 4.9 and the assumptions 
that U is MTP2 and that Gj = Gi^i are sublattices of [0, 1]''. Assuming the statement 
true for 1 < k <m, to verify that it is true for fc + 1 we need only show that £(U | U S 
C^^,k n Gk) and C(IJ \ U e G^^^k n G^J are MTP2, which follows from Lemma 4.9, thus 
completing the induction. 

Hence, again using Lemma 4.9, £(U | U G Gi^.fe) is associated. Since Gk is increas- 
ing. Lemma 4.8 now yields Wjg+^ <cx W^ie' for all fc = 1, . . . , m — 1, and, therefore, the 
theorem. D 

A sequence of partitions as in Theorem 4.6 can be generated as follows: start with the 
whole space [0,1]'^, then split it into boxes by repeatedly subdividing one element of the 
partition by an intersection with some G and G'^. In [0,1]^, the resulting partition forms 
a tiling of the square by rectangles. Note that from the first step, a sequence of partitions 
created using G as above has at least one line that crosses the whole square from side to 
side. Therefore the tiling of Figure 1 is not attainable by such a sequence. 

Finally, recall that the hypothesis of MTP2 includes as a particular case the uniform 
distribution on [0, l]'^, so Theorem 4.6 applies to the estimation of the integral / /(u) du 
on [0, l]**, or any lattice. 



Figure 1. Non-attainable tiling. 
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Appendix 

Lemma A.l. Given a partition B* of N, consider a collection of independent random 
variables {^^ }, B* £ B* , j ^ B* , with those indexed by the same element B* of the 
partition being identically distributed. 

For C* <rof. B* , let {^j^ } with C* G C* and j G C* be a collection of independent 
random variables with the mixture distribution 



\c* 



Then 




max niax£^ <=* max maxf. . (A. 2) 

C'eC'jec ■' B'eB' ]i£B'^ 

Proof. Let p^' = P(Cf' < t) for B* e B* and p^ 
Wc claim that 

(p^i, . . . ,p^S . . . ,p^s . . .,p^') -< (p^i 

\C',\ |C-| 

To see this, observe that (A.l) impHes that the vector on the left-hand side above is 
obtained from the one on the right by multiplying it by the n x n doubly stochastic 
matrix D, which is block diagonal where the ith block is the \C*\ x \C*\ matrix with all 
entries equal to 1/|C*|. 

Hence, by the Schur concavity of the function {9i,. . . , On) i—> nr=i ^'' ^^ have 

Pfmax max£f* <f') = TT (p'^*)''^'! > TT (p^*)'^*' =Pf max max^f * < i 
Kc-ac jdC'^ ~ I 11'^ -11 ^-f^ \B'eB* jeB*^ ~ 

C'GC* B'eB* 

which is equivalent to (A. 2). D 

Proof of Theorem 3.1. Let B* and C* be partitions associated with B and C, respec- 
tively, satisfying C* <rcf. B*, and let {^f *,S* GB*,j€B*} and {^f ,C* e C*,j G C*} 
be collections of independent random variables with distributions 

P(^f' <t)= f{f{U) <t\UeB), 
V{S.f <t)= n.f{U) <t\UeC). 

Then (A.l) holds (law of total probability), and the result follows by Lemma A.l. D 

Proof of Theorem 4.1. hi what follows we consider conditional expectation with re- 
spect to a partition. Though the notion is standard, specifically, by E[/(t/) + £\B], we 
mean the random variable that takes values fg := E[/([/) \ U <E B] with probability 
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\B*\/n. Then 

Var[/([/) + e\B] = E[{f{U) + e - E[,f{U) + e\B]y\B] 
= E{{f{U)+e-E[f{U)\B]y\B] 

is a random variable taking values E[(/(t/) +e — fg)^ \U £ B] with probability \B*\/n, 
and 

E[Var[/(C/) + e\B]] = ^ ^E[(/([/) + e - J^f \UeB] 
BeB 



= i^|S*|E[(/(V^i^) + e-/sf 



71 

BeB 



ivar 
n 



BeBjeB* 



= nVar[Tyi|]. 

If C <ict. ^, then for any random variable Y, say, Var[E[F|,S]] > Var[E[y |C]] by Jensen's 
inequality, and now the usual variance decomposition of Y (see, e.g., [21], Theorem 13.3.1) 
implies E[Var[y|B]] <E[Var[r|C]]. Therefore 

E[Var[f{U) + e\B]] < E[Var[/(t/) + ejC]], 

and hence 

Var[l^i|] = -E[Var[/([/) + e\B]] < -E[Var[/(C/) + e\C]] - Var[<E]. □ 

n n 

Acknowledgements 

We thank Abram Kagan for sparking our curiosity in the topic with a simple version 
of Theorem 3.1, Erich Novak for an important bibliographical reference, and Pierpaolo 
Brutti for his help with R. We are indebted to the editor, an associate editor and three 
referees for their accurate reading of the paper and their helpful comments. The work of 
Yoscf Rinott is partially supported by the Israel Science Foundation grant No. 473/04. 
The work of Marco Scarsini is partially supported by MIUR-COFIN. 

References 

[1] Bai, S.K. and Durairajan, T.M. (1997). Optimal equivariant estimator with respect to 
convex loss function. J. Statist. Plann. Inference 64 283-295. MR1621618 

[2] Berger, J.O. (1976). Admissibility results for generalized Bayes estimators of coordinates 
of a location vector. Ann. Statist. 4 334-356. MR0400486 



608 L. Goldstein, Y. Rinott and M. Scarsini 

[3] Blackwell, D. (1951). Comparison of experiments. In Proceedings of the Second Berkeley 

Symposium on Mathematical Statistics and Probability, 1950 93-102. Berkeley and 

Los Angeles, CA: California Univ. Press. MR0046002 
[4] Blackwell, D. (1953). Equivalent comparisons of experiments. Ann. Math. Statist. 24 

265-272. MR0056251 
[5] Eberl Jr., W. (1984). On unbiased estimation with convex loss functions. Statist. Decisions 

1984 177-192. MR0785208 
[6] Ermakov, S.M., Zhiglyavskii, A. A. and Kondratovich, M.V. (1988). Reduction of a problem 

of random estimation of an extremum of a function. Dokl. Akad. Nauk SSSR 302 

796-798. MR0983943 
[7] Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering. New York: Springer. 

MR1999614 
[8] Goldstein, L., Rinott, Y. and Scarsini, M. (2010). Stochastic comparisons of stratified 

sampling techniques for some Monte Carlo estimators. Technical report. Available at 

arXiv:1005.5414vl [math. ST]. 
[9] Karlin, S. and Novikoff, A. (1963). Generalized convex inequalities. Pacific J. Math. 13 

1251-1279. MR0156927 
[10] Karlin, S. and Rinott, Y. (1980). Classes of orderings of measures and related correla- 
tion inequalities. 1. Multivariate totally positive distributions. J. Multivariate Anal. 10 

467-498. MR0599685 
[11] Kondratovich, M. and Zhigljavsky, A. (1998). Comparison of independent and stratified 

sampling schemes in problems of global optimization. In Monte Carlo and Quasi-Monte 

Carlo Methods 1996 (Salzburg) 292-299. New York: Springer. MR1644527 
[12] Kozek, A. (1977). Efficiency and Cramer-Rao type inequalities for convex loss functions. 

J. Multivariate Anal. 7 89-106. MR0431482 
[13] Laycock, P.J. (1972). Convex loss applied to design in regression problems. J. Roy. Statist. 

Soc. Ser. B 34 148-170, 170-186. MR0350935 
[14] Laycock, P.J. and Silvey, S.D. (1968). Optimal designs in regression problems with a general 

convex loss function. Biometrika 55 53-66. MR0225446 
[15] Lin, P.E. and Mousa, A. (1982). Proper Bayes minimax estimators for a multivariate normal 

mean with unknown common variance under a convex loss function. Ann. Inst. Statist. 

Math. 34 441-456. MR0695065 
[16] Marshall, A.W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applica- 
tions. New York: Academic Press. MR0552278 
[17] Miiller, A. and Stoyan, D. (2002). Comparison Methods for Stochastic Models and Risks. 

Chichester: Wiley. MR1889865 
[18] Novak, E. (1988). Deterministic and Stochastic Error Bounds m Numerical Analysis. Berlin: 

Springer. MR0971255 
[19] Papageorgiou, A. (1993). Integration of monotone functions of several variables. J. Com- 
plexity 9 252-268. MR1226312 
[20] Petropoulos, C. and Kourouklis, S. (2001). Estimation of an exponential quantile under 

a general loss and an alternative estimator under quadratic loss. Ann. Inst. Statist. 

Math. 53 746-759. MR1880809 
[21] Rosenthal, J.S. (2006). A First Look at Rigorous Probability Theory, 2nd ed. Hackensack, 

NJ: World Scientific Publishing. MR1767078 
[22] Ross, S.M. and Schechner, Z. (1984). Some reliability applications of the variability ordering. 

Oper. Res. 32 679-687. MR0756013 



Stratified sampling for some Monte Carlo estimators 609 

[23] Shaked, M. (1982). A general theory ol some positive dependence notions. J. Multivariate 

Anal. 12 199-218. MR0661559 
[24] Shaked, M. and Shanthikumar, J.G. (2007). Stochastic Orders. New York: Springer. 

MR2265633 
[25] Zhigljavsky, A. and Zilinskas, A. (2008). Stoctiastic Global Optimization. New York: 

Springer. MR2361744 
[26] Zhigljavsky, A. A. and Chekmasov, M.V. (1996). Comparison of independent, stratified and 

random covering sample schemes in optimization problems. Math. Comput. Modelling 

23 97-110. MR1398005 

Received April 2009 and revised May 2010 



